Abstract

When we represent a network of sensors in Euclidean space by a graph, 
there are two distances between any two nodes that we may consider. One 
of them is the Euclidean distance. The other is the distance between the
two nodes in the graph, defined to be the number of edges on a shortest 
path between them. In this paper, we consider a network of sensors placed 
uniformly at random in a two-dimensional region and study two condi- 
tional distributions related to these distances. The first is the probability 
distribution of distances in the graph, conditioned on Euclidean distances; 
the other is the probability density function associated with Euclidean 
distances, conditioned on distances in the graph. We study these distri- 
butions both analytically (when feasible) and by means of simulations. 
To the best of our knowledge, our results constitute the first of their kind 
and open up the possibility of discovering improved solutions to certain 
sensor-network problems, such as sensor localization.

Keywords: Sensor networks. Random geometric graphs. Distance distri- 
butions. 



1 Introduction 

We consider a network of n sensors, each one placed at a fixed position in
two-dimensional space and capable of communicating with another sensor if and
only if the Euclidean distance between the two is at most R, for some constant
radius R > 0. If $\delta_{ij}$ denotes this distance for sensors i and j, then a
graph representation of the network can be obtained by letting each sensor be a
node and creating an edge between any two distinct nodes i and j such that
$\delta_{ij} \le R$. Such a representation is, aside from a scale factor,
equivalent to a unit disk graph [3].

Often n is a very large integer and the network is essentially unstructured, 
in the sense that the sensors' positions, although fixed, are generally unknown. 
In domains for which this holds, generalizing the graph representation in such 
a way that each node's position is given by random variables becomes a crucial 
step, since it opens the way to the investigation of relevant distributions related 
to all networks that result from the same deployment process. Such a general- 
ization, which can be done for any number of dimensions, is known as a random 
geometric graph [32]. Similarly to the random graphs of Erdős and Rényi [13]
and related structures [?], many important properties of random geometric
graphs are known, including some related to connectivity and the appearance
of the giant component [?] [?] [3] and others more closely related to
applications.

One curious aspect of random geometric graphs is that, if nodes are posi- 
tioned uniformly at random, the expected Euclidean distance between any two 
nodes is a constant in the limit of very large n, depending only on the number 
of dimensions (two, in our case) [5]. In this case, distance-dependent analyses 
must necessarily couple the Euclidean distance with some other type of distance 
between nodes. The natural candidate is the standard graph-theoretic distance 
between two nodes, given by the number of edges on a shortest path between 
them [3]. For nodes i and j, this distance is henceforth denoted by $d_{ij}$ and
referred to simply as the distance between i and j.

Given i and j, the Euclidean distance $\delta_{ij}$ and the distance $d_{ij}$
between the two nodes are not independent of each other, but rather interrelate
in a complex way. Our goal in this paper is to explore the relationship between
the two when all sensors are positioned uniformly at random in a given
two-dimensional region. Specifically, for i and j two distinct nodes chosen at
random, we study the probability that $d_{ij} = d$ for some integer d > 0, given
that $\delta_{ij} = \delta$ for some real number $\delta > 0$. Similarly, we
also study the probability density associated with $\delta_{ij} = \delta$ when
$d_{ij} = d$. Our study is analytical whenever feasible, but is also
computational throughout. Depending on the value of d, we are in a few cases
capable of providing exact closed-form expressions, but in general what we give
are approximations, either derived mathematically or inferred from simulation
data exclusively.

We remark, before proceeding, that we perceive the study of distance-related 
distributions for random geometric graphs as having great applicability in the 
field of sensor networks, particularly in domains in which it is important for 
each sensor to have a good estimate of its location. In fact, of all possible
applications that we normally envisage for sensor networks [15], network
localization is crucial in all cases that require the sensed data to be tagged
with reliable indications of where the data come from; it has also been shown to
be important even for routing purposes [23]. So, although we do not dwell on the
issue of network localization anywhere else in the paper, we now digress
momentarily to clarify what we think the impact of distance-related
distributions may be.

The problem of network localization has been tackled from a variety of
perspectives, including rigidity-theoretic studies [?], approaches that are
primarily algorithmic, either centralized [?] or distributed [?] [31] [?] [?],
and others that generalize on our assumptions by taking advantage of sensor
mobility [21] [27] or uneven radii [22]. In general one assumes the existence of
some anchor sensors (regularly placed [5] or otherwise), for which positions are 
known precisely, and then the problem becomes reduced to the problem of pro- 
viding, for each of the other sensors, the Euclidean distances that separate it 
from three of the anchors (its tripolar coordinates with respect to those anchors, 
from which the sensor's position can be easily calculated [37]). 

Finding a sensor's Euclidean distance to an anchor is not simple, though.
Sometimes signal propagation is used for direct or indirect measurement [?]
[20] [33] [?] [36] [30], but there are approaches that rely on no such
techniques [25] [?] [31]. The latter include one of the most successful
distributed approaches [31], which nonetheless suffers from increasing lack of
accuracy as sparsity or irregularity in sensor positioning becomes more
pronounced. The algorithm of [31] assumes, for each anchor i, that each edge on
any shortest path to i is equivalent to a fixed Euclidean distance, which is
estimated by i in communication with the other anchors and by simple
proportionality can be used by any node to infer its Euclidean distance to i. We
believe that knowledge of distance-related distributions has an important role
to play in replacing this assumption and perhaps dispelling the algorithm's
difficulties in the less favorable circumstances alluded to above.

We proceed in the following manner. In Section 2 we give some notation 
and establish the overall approach to be followed when pursuing the analytical 
characterization of distance-related distributions. Then in Sections 3 through 
5 we present the mathematical analysis of the d = 1 through d = 3 cases. We 
continue in Section 6 with computational results related to $d \ge 1$ and close in
Section 7 with some discussion and concluding remarks. 



2 Overall approach 

Let i and j be two distinct, randomly chosen nodes. For d > 0 an integer and
$\delta > 0$ a real number, we use $P_\delta(d)$ to denote the probability,
conditioned on $\delta_{ij} = \delta$, that $d_{ij} = d$. Likewise, we use
$p_d(\delta)$ to denote the probability density, conditioned on $d_{ij} = d$,
associated with $\delta_{ij} = \delta$. These two quantities relate to each
other in the standard way of combining integer and continuous random variables
[35].

If we assume that $P_\delta(d)$ is known for all applicable values of d and
$\delta$, then it follows from Bayes' theorem that

$$p_d(\delta) = \frac{P_\delta(d)\,p(\delta)}{P(d)}, \tag{1}$$

where $p(\delta)$ is the unconditional probability density associated with the
occurrence of a Euclidean distance of $\delta$ separating two randomly chosen
nodes and $P(d)$ is the unconditional probability that the distance between them
is d. Clearly, $P(d) = \int_0^{dR} P_r(d)\,p(r)\,dr$, since $P_r(d) = 0$ for
$r > dR$. Moreover, $p(r)$ is proportional to the circumference of a radius-r
circle, $2\pi r$, which yields

$$p_d(\delta) = \frac{\delta\,P_\delta(d)}{\int_0^{dR} r\,P_r(d)\,dr}. \tag{2}$$

In view of Equation (2), our approach henceforth is to concentrate on
calculating $P_\delta(d)$ for all appropriate values of d and $\delta$, and then
to use the equation to obtain $p_d(\delta)$. In order to calculate
$P_\delta(d)$, we fix two nodes a and b such that $\delta_{ab} = \delta$ and
proceed by analyzing how the two radius-R circles (the one centered at a and the
one at b) relate to each other. While doing so, we assume that the
two-dimensional region containing the graph has unit area, so that the area of
any of its sub-regions automatically gives the probability that it contains a
randomly chosen node. We assume further that all border effects can be safely
ignored (but see Section 6 for the computational setup that justifies this).
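
As a rough numerical illustration of Equation (2), which expresses the density
as delta * P_delta(d) divided by the integral of r * P_r(d) over [0, dR], the
sketch below evaluates that ratio with a midpoint rule. The function names
(`conditional_density`, `P_one`), the callable interface P(r, d), and the step
count are assumptions made for illustration only; they are not part of the
original analysis.

```python
def conditional_density(P, d, delta, R, steps=10_000):
    """Approximate p_d(delta) = delta * P(delta, d) / int_0^{dR} r * P(r, d) dr.

    P(r, d) is any callable returning the probability that two nodes at
    Euclidean distance r are d edges apart (hypothetical interface).
    """
    h = d * R / steps  # width of each integration cell on [0, dR]
    # Midpoint rule for the normalizing integral in the denominator.
    integral = h * sum((i + 0.5) * h * P((i + 0.5) * h, d) for i in range(steps))
    return delta * P(delta, d) / integral


def P_one(r, d, R=0.055):
    """d = 1 case: two nodes are adjacent exactly when r <= R."""
    return 1.0 if (d == 1 and r <= R) else 0.0


R = 0.055
density = conditional_density(P_one, 1, 0.03, R)
print(density, 2 * 0.03 / R**2)  # the two values should agree closely
```

For d = 1 the denominator is R^2 / 2, so the ratio reduces to 2 * delta / R^2
for delta at most R, which is what the printed comparison checks.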



3 The distance-1 and distance-2 cases

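The case of d = 1 is straightforward, since $d_{ab} = d$ if and only if
$\delta \le R$. Consequently,

$$P_\delta(1) = \begin{cases} 1, & \text{if } \delta \le R; \\ 0, & \text{otherwise,} \end{cases} \tag{3}$$

and, by Equation (2),

$$p_1(\delta) = \begin{cases} 2\delta/R^2, & \text{if } \delta \le R; \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$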



For d = 2, we have $d_{ab} = d$ if and only if $\delta > R$ and at least one
node k exists, with $k \notin \{a, b\}$, such that $\delta_{ak} \le R$ and
$\delta_{bk} \le R$. The probability that this holds for a randomly chosen k is
given by the intersection area of the radius-R circles centered at a and b, here
denoted by $\rho_\delta$. From [37], we have

$$\rho_\delta = \begin{cases} 2R^2\cos^{-1}(\delta/2R) - \delta\sqrt{R^2 - \delta^2/4}, & \text{if } \delta \le 2R; \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

Because any node that is not a or b may, independently, belong to such
intersection, we have

$$P_\delta(2) = \begin{cases} 1 - (1 - \rho_\delta)^{n-2}, & \text{if } \delta > R; \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

As for $p_2(\delta)$, it is as given by Equation (2), equaling 0 if
$\delta \le R$ or $\delta > 2R$ (we remark that a closed-form expression is
obtainable also in this case, but it is too cumbersome and is for this reason
omitted).
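
The two d = 2 quantities can be evaluated directly. The following is a minimal
sketch, assuming the intersection area of two radius-R circles at distance delta
is 2 R^2 arccos(delta / 2R) - delta sqrt(R^2 - delta^2 / 4) for delta at most
2R, and that the distance-2 probability is 1 - (1 - area)^(n - 2) for delta
greater than R. The names `lens_area` and `prob_distance_2` are illustrative
and do not come from the paper.

```python
import math


def lens_area(delta, R):
    """Intersection area of two radius-R circles whose centers are delta apart."""
    if delta > 2 * R:
        return 0.0
    # max() guards against a tiny negative argument from rounding near delta = 2R.
    half_chord = math.sqrt(max(0.0, R**2 - delta**2 / 4))
    return 2 * R**2 * math.acos(delta / (2 * R)) - delta * half_chord


def prob_distance_2(delta, R, n):
    """Probability that two nodes delta apart are exactly two edges apart.

    Assumes a unit-area region, so an area equals the probability that a
    uniformly placed node falls inside it; border effects are ignored.
    """
    if delta <= R:
        return 0.0  # the nodes are adjacent, so their distance is 1
    return 1.0 - (1.0 - lens_area(delta, R)) ** (n - 2)


print(prob_distance_2(0.06, 0.055, 1000))
```

Because `lens_area` returns zero beyond 2R, `prob_distance_2` vanishes there
without a separate branch.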
4 The distance-3 case: exact basis



The d = 3 case is substantially more complex than its predecessors in Section 3.
We begin by noting that $d_{ab} = d$ if and only if the following three
conditions hold:

C1. $\delta > R$.

C2. No node i exists such that both $\delta_{ai} \le R$ and $\delta_{bi} \le R$.

C3. At least one node $k \notin \{a, b\}$ exists, and for this k at least one
node $\ell \notin \{a, b, k\}$, such that $\delta_{ak} \le R$,
$\delta_{k\ell} \le R$, $\delta_{b\ell} \le R$, $\delta_{a\ell} > R$, and finally
$\delta_{bk} > R$.

For each fixed k and $\ell$ in Condition C3, these three conditions result from
the requirement that nodes a, k, $\ell$, and b, in this order, constitute a
shortest path from a to b.

If we fix some node $k \notin \{a, b\}$ for which $\delta_{ak} \le R$ and
$\delta_{bk} > R$, the probability that Condition C3 is satisfied by k and a
randomly chosen $\ell$ is a function of intersection areas of circles that
varies from case to case, depending on the value of $\delta$. There are two
cases to be considered, as illustrated in Figure 1. In the first case,
illustrated in part (a) of the figure, $R < \delta \le 2R$ and node $\ell$ is to
be found in the intersection of the radius-R circles centered at b and k,
provided it is not also in the radius-R circle centered at a. The intersection
area of interest results from computing the intersection area of two circles
(those centered at b and k) and subtracting from it the intersection area of
three circles (those centered at a, b, and k). The former of these intersection
areas is given as in Equation (5), with $\delta_{bk}$ substituting for $\delta$;
as for the latter, closed-form expressions also exist, as given in [?]. The
second case, shown in part (b) of Figure 1, is that of $2R < \delta \le 3R$, and
then the intersection area of interest is the one of the circles centered at b
and k. Regardless of which case it is, we use $\sigma_\delta^k$ to denote the
resulting area. Thus, the probability that at least one $\ell$ exists for fixed
k is $1 - (1 - \sigma_\delta^k)^{n-3}$.

Now let $P'_\delta(3)$ be the probability that a randomly chosen k satisfies
Condition C3. Let also $K_\delta$ be the region inside which such a node can be
found with nonzero probability. If $x_k$ and $y_k$ are the Cartesian coordinates
of node k, then each possible location of k inside $K_\delta$ contributes to
$P'_\delta(3)$ the infinitesimal probability
$[1 - (1 - \sigma_\delta^k)^{n-3}]\,dx_k\,dy_k$. It follows that

$$P'_\delta(3) = \int_{k \in K_\delta} \left[1 - (1 - \sigma_\delta^k)^{n-3}\right] dx_k\,dy_k. \tag{7}$$

There are three possibilities for the region $K_\delta$, shown in parts (a)
through (c) of Figure 2 as shaded regions, respectively for
$R < \delta \le R\sqrt{3}$, $R\sqrt{3} < \delta \le 2R$, and
$2R < \delta \le 3R$. The shaded region in part (a) is delimited by four
radius-R circles, the ones centered at nodes a (above and below) and b (on the
right) and the ones centered at points D and E (on the left). As $\delta$ gets
increased beyond $R\sqrt{3}$ (at which threshold point D becomes collinear with
point B and node b), we move into part (b) of the figure, where the shaded
region is now delimited on the left either by the radius-R circles centered at D
and E or by the radius-2R circle centered at b, depending on the point of common
tangent between each of the radius-R circles and the radius-2R circle. The next
threshold leads $\delta$ beyond 2R, and in part (c) of the figure the shaded
region is delimited on the left by the radius-2R circle centered at b, on the
right by the radius-R circle centered at a.

Figure 1: Regions (shown in shades) whose areas yield the value of
$\sigma_\delta^k$ for $R < \delta \le 2R$ (a) and $2R < \delta \le 3R$ (b).

Figure 2 is also useful in helping us obtain a more operational version of the
expression for $P'_\delta(3)$, to be used in Section 6. First we establish a
Cartesian coordinate system by placing its origin at node a and making the
positive abscissa axis go through node b. In this system, the shaded regions in
all of parts (a) through (c) of the figure are symmetrical with respect to the
abscissa axis. If for each value of $x_k$ we let $y_k^-(x_k)$ and $y_k^+(x_k)$
be, respectively, the minimum and maximum $y_k$ values in the upper half of the
shaded region for the value of $\delta$ at hand, then

$$P'_\delta(3) = 2\int_{x_k^-}^{x_k^+}\int_{y_k^-(x_k)}^{y_k^+(x_k)} \left[1 - (1 - \sigma_\delta^k)^{n-3}\right] dy_k\,dx_k, \tag{8}$$

where $x_k^-$ and $x_k^+$ bound the possible values of $x_k$ for the given
$\delta$.

All pertinent values of $x_k^-$ and $x_k^+$, as well as of $y_k^-(x_k)$ and
$y_k^+(x_k)$, are given in Table 1, where $\delta^-$ and $\delta^+$ indicate,
respectively, the lower and upper limit for $\delta$ in each of the three
possible cases. This table's entries make reference to the abscissae of points
A, B, C, and D (respectively $x_A$, $x_B$, $x_C$, and $x_D$) and to the ordinate
of point D ($y_D$). These are given in Table 2.



5 The distance-3 case: approximate extension 

Obtaining $P_\delta(3)$ from $P'_\delta(3)$ requires that we fulfill the
remaining requirements set by Conditions C2 and C3 in Section 4. These are that
no node exists in the intersection of the radius-R circles centered at a and b
and that at least one node k exists with the properties given in Condition C3.
While the probability of the former requirement is simply
$(1 - \rho_\delta)^{n-2}$, expressing the probability of the latter demands that
we make a careful approximation to compensate for the lack of independence of
certain events with respect to one another.

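For node $i \notin \{a, b\}$, let $\epsilon_i$ stand for the event that
Condition C3 does not hold for k = i. Let also $Q_\delta(\epsilon_i)$ be the
probability of $\epsilon_i$ and $Q_\delta$ the joint probability of all $n - 2$
events. Clearly, $Q_\delta(\epsilon_i) = 1 - P'_\delta(3)$ for any i and, for
$\delta > R$, $P_\delta(3) = (1 - Q_\delta)(1 - \rho_\delta)^{n-2}$. Therefore,
if all the $n - 2$ events were independent of one another, we would have

$$Q_\delta = \prod_{i \notin \{a, b\}} Q_\delta(\epsilon_i) = \left[1 - P'_\delta(3)\right]^{n-2} \tag{9}$$

and, consequently,

$$P_\delta(3) = \begin{cases} \left\{1 - \left[1 - P'_\delta(3)\right]^{n-2}\right\}(1 - \rho_\delta)^{n-2}, & \text{if } \delta > R; \\ 0, & \text{otherwise.} \end{cases} \tag{10}$$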

However, once we know of a certain node i that Condition C3 does not hold 
for it, immediately we reassess as less likely that the condition holds for nodes 
in the Euclidean vicinity of i. The $n - 2$ events introduced above are then not
unconditionally independent of one another, although we do expect whatever 
degree of dependence there is to wane progressively as we move away from node 
i. 

We build on this intuition by postulating the existence of an integer
$n' < n - 2$ such that the independence of the n' events not only holds but is
also sufficient to determine $P_\delta(3)$ as indicated above, provided the
corresponding n' nodes are picked uniformly at random. But since this is
precisely the way in which, by assumption, sensors are positioned, it suffices
that any n' nodes be selected, yielding

$$P_\delta(3) = \begin{cases} \left\{1 - \left[1 - P'_\delta(3)\right]^{n'}\right\}(1 - \rho_\delta)^{n-2}, & \text{if } \delta > R; \\ 0, & \text{otherwise.} \end{cases} \tag{11}$$

Similarly to the previous cases, $p_3(\delta)$ is given by Equation (2) and
equals 0 if $\delta \le R$ or $\delta > 3R$.

Table 1: Cartesian coordinates delimiting the upper halves of shaded regions in

It remains, of course, for the value of n' to be discovered if our postulate is to 
be validated. We have done this empirically, by means of computer simulations, 
as discussed in Section 6. 

6 Computational results 

In this section we present simulation results and, for d = 1, 2, 3, contrast them
with the analytic predictions of Sections 3 through 5. The latter are obtained
by numerical integration when a closed-form expression is not available (the
case of d = 3 also requires simulations for finding n'; see below). For d > 3, we
demonstrate that good approximations by Gaussians can be obtained.

We use n = 1 000 and a circular region of unit area, therefore of radius
$\sqrt{1/\pi} \approx 0.564$, for the placement of nodes. Node a is always
placed at the circle's center, which has Cartesian coordinates (0, 0), and all
results refer to distances to a. Our choice for the value of R depends on the
expected number of neighbors (or connectivity) of a node, which we denote by z
and use as the main parameter. Since $z = \pi R^2 n$ for large n, choosing the
value of z immediately yields the value of R to be used. We use $z = 3\pi$ and
$z = 5\pi$, which yield, respectively, $R \approx 0.055$ and $R \approx 0.071$.
We note that both values of z are significantly above the phase transition that
gives rise to the giant component, which happens at $z \approx 4.52$ [11]. In
all our experiments, then, graphs are connected with high probability.
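
The arithmetic behind these parameter choices is easy to reproduce. The sketch
below simply solves z = pi * R^2 * n for R and computes the radius of a
unit-area circle; the function name `radius_for_connectivity` is an assumption
introduced here for illustration.

```python
import math


def radius_for_connectivity(z, n):
    """Solve z = pi * R**2 * n for the communication radius R."""
    return math.sqrt(z / (math.pi * n))


n = 1000
region_radius = math.sqrt(1 / math.pi)           # radius of a unit-area circle
R_low = radius_for_connectivity(3 * math.pi, n)  # z = 3*pi
R_high = radius_for_connectivity(5 * math.pi, n) # z = 5*pi
print(round(region_radius, 3), round(R_low, 3), round(R_high, 3))
```

With n = 1 000 the two radii are the square roots of 0.003 and 0.005, which
round to the values quoted in the text.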

For each value of z, each simulation result we present is an average over 10^ 
independent trials. Each trial uses a matrix of accumulators having $n - 1$ rows
(one for each of the possible distance values) and $1\,000\sqrt{1/\pi}$ columns
(one for each of the 0.001-wide bins into which Euclidean distances are
compartmentalized). A trial consists of: placing $n - 1$ nodes uniformly at
random in the circle; computing the Euclidean distance between each node and
node a; computing the distances between each node and node a (this is done with
Dijkstra's algorithm [10]); updating the accumulator that corresponds to each
node, given its two distances. At the end of each trial, its contributions to
$P_\delta(d)$ and $p_d(\delta)$ are computed, with $d = 1, 2, \ldots, n - 1$ and
$\delta$ ranging through the middle points of all bins. If M is the matrix of
accumulators, then these contributions are given, respectively, by
$M(d, \delta) / \sum_{d'} M(d', \delta)$ and
$M(d, \delta) / \left(0.001 \sum_{\delta'} M(d, \delta')\right)$.
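
One trial of this procedure can be sketched as follows. This is an illustrative
reading of the description above, not the code used for the reported results:
it uses breadth-first search in place of Dijkstra's algorithm (the two give the
same hop counts on an unweighted graph), stores the accumulators sparsely in a
dictionary, and uses rejection sampling to place nodes in the circle. All
function names are assumptions.

```python
import math
import random
from collections import deque


def run_trial(n, R, bin_width=0.001, rng=random):
    """Place n-1 random nodes around a central node a; return accumulators M."""
    region_radius = math.sqrt(1 / math.pi)  # unit-area circle
    pts = [(0.0, 0.0)]                      # index 0 is node a, at the center
    while len(pts) < n:
        x = rng.uniform(-region_radius, region_radius)
        y = rng.uniform(-region_radius, region_radius)
        if x * x + y * y <= region_radius**2:  # keep points inside the circle
            pts.append((x, y))
    # Build adjacency lists: an edge joins nodes at Euclidean distance <= R.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= R:
                adj[i].append(j)
                adj[j].append(i)
    # Breadth-first search from node a gives the number of edges to each node.
    hops = [None] * n
    hops[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if hops[v] is None:
                hops[v] = hops[u] + 1
                queue.append(v)
    n_bins = math.ceil(region_radius / bin_width)
    M = {}  # sparse accumulators: (graph distance, bin index) -> count
    for v in range(1, n):
        if hops[v] is None:
            continue  # node not connected to a; it contributes nothing
        b = min(int(math.hypot(*pts[v]) / bin_width), n_bins - 1)
        M[(hops[v], b)] = M.get((hops[v], b), 0) + 1
    return M


def estimate_P(M, d, b):
    """Share of bin b's nodes at graph distance d: M(d,b) / sum_d' M(d',b)."""
    column = sum(c for (_, bb), c in M.items() if bb == b)
    return M.get((d, b), 0) / column if column else 0.0


def estimate_p(M, d, b, bin_width=0.001):
    """Density estimate: M(d,b) / (bin_width * sum_b' M(d,b'))."""
    row = sum(c for (dd, _), c in M.items() if dd == d)
    return M.get((d, b), 0) / (bin_width * row) if row else 0.0
```

A call such as `run_trial(1000, 0.055)` corresponds to one trial with z = 3*pi;
averaging the two estimators over many trials is left out of the sketch.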

The case of d = 3 requires two additional simulation procedures, one for
determining simulation data for $P'_\delta(3)$, the other to determine n' for
use in obtaining analytic predictions for $P_\delta(3)$. The former of these
fixes node b at coordinates $(\delta, 0)$ and performs 10^ independent trials.
At each trial, two nodes are generated uniformly at random in the circle. At the
end of all trials, the desired probability is computed as the fraction of trials
that resulted in nodes k and $\ell$ as in Section 4.
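
A literal reading of this first procedure can be sketched as below. The sketch
assumes that "nodes k and l as in Section 4" means the pairwise inequalities of
Condition C3 (k within R of a, l within R of both k and b, l farther than R from
a, and k farther than R from b); the function names and the trial count are
illustrative assumptions.

```python
import math
import random


def sample_in_disc(radius, rng):
    """Draw a point uniformly from the disc of the given radius (rejection)."""
    while True:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return (x, y)


def fraction_satisfying_c3(delta, R, trials, rng):
    """Fraction of trials in which two random nodes k, l form a path a-k-l-b."""
    region_radius = math.sqrt(1 / math.pi)  # unit-area circle centered at a
    a, b = (0.0, 0.0), (delta, 0.0)
    hits = 0
    for _ in range(trials):
        k = sample_in_disc(region_radius, rng)
        l = sample_in_disc(region_radius, rng)
        if (math.dist(a, k) <= R and math.dist(k, l) <= R
                and math.dist(b, l) <= R
                and math.dist(a, l) > R and math.dist(b, k) > R):
            hits += 1
    return hits / trials


print(fraction_satisfying_c3(0.3, 0.2, 100_000, random.Random(1)))
```

By the triangle inequality the fraction is exactly zero whenever delta exceeds
3R, which gives a quick sanity check on any implementation.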

The simulation for the determination of n' is conducted for $\delta = 2R$ only,
whence $\rho_\delta = 0$. This is the value of $\delta$ for which the results
from the simulation above for $P_\delta(3)$ and the analytic prediction for
$1 - [1 - P'_\delta(3)]^{n-2}$ differ the most (data not shown). Moreover, as we
will see shortly, the value of n' we find using this value of $\delta$ is good
for all other values as well. The simulation is aimed at finding the value of
$Q_\delta$ and proceeds in 10^ independent trials. Each trial fixes node b at
$(\delta, 0)$ and places the remaining $n - 2$ nodes in the circle uniformly at
random. The fraction of trials resulting in no node qualifying as the node k of
Section 4 is the value of $Q_\delta$. We set n' to be the $m < n - 2$ that
minimizes $|Q_\delta - [1 - P'_\delta(3)]^m|$, where $P'_\delta(3)$ refers to
the analytic prediction. Our results are n' = 779 for $z = 3\pi$, n' = 780 for
$z = 5\pi$.
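
The final minimization step is a one-dimensional search over integers and can be
sketched as follows. The function name `fit_effective_count` is an assumption,
and the inputs in the usage lines are made-up numbers chosen only to exercise
the function; they are not the simulated or analytic values from this study.

```python
def fit_effective_count(Q, P_prime, n):
    """Return the integer m with 1 <= m < n - 2 minimizing |Q - (1 - P_prime)**m|."""
    return min(range(1, n - 2), key=lambda m: abs(Q - (1.0 - P_prime) ** m))


# Hypothetical inputs: build Q from a known exponent and check it is recovered.
P_prime = 0.002
Q = (1.0 - P_prime) ** 500
best_m = fit_effective_count(Q, P_prime, n=1000)
print(best_m)
```

Since (1 - P')^m decreases monotonically in m, the minimizer can also be
located near ln Q / ln(1 - P') without an exhaustive scan.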

Results for d = 1 are shown in Figure 3, for d = 2 in Figure 4, for d = 3 in
Figures 5 and 6, and for d > 3 in Figure 7. In all figures, both $P_\delta(d)$
and $p_d(\delta)$ are plotted against $\delta$, since it seems better to
visualize what happens as one gets progressively farther from node a in
Euclidean terms. For this reason, the plots for $P_\delta(d)$ do not constitute
a probability distribution for any fixed value of d.

7 Discussion and conclusion 

The results summarized in Figures 3 through 6 reveal excellent agreement between
the analytic predictions we derived in Sections 3 through 5 and our simulation
data. This holds not only for the simple cases of d = 1 and d = 2, but also for
the considerably more complex cases of $P'_\delta(3)$ and $P_\delta(3)$. The
latter, in particular, depends on the empirically determined n'. In this
respect, it is clear from Figure 6 that, even though n' could have been
calculated for a greater assortment of $\delta$ values, doing it exclusively for
$\delta = 2R$ seems to have been sufficient.

Figure 7 contemplates some of the d > 3 cases, for which we derived no
analytic predictions. The values of d that the figure covers in parts (a, b) and
(c, d), respectively for $z = 3\pi$ and $z = 5\pi$, are 4, ..., 11. Of these,
d = 11 for $z = 5\pi$ in part (d) typifies what happens for larger values of d
as well (omitted for clarity), viz. probability densities sharply concentrated
at the border of the radius-$\sqrt{1/\pi}$ circle centered at node a. Note that
the same also occurs for $z = 3\pi$, but owing to the smaller R it only happens
for larger values of d (omitted from part (b), again for clarity).

For $4 \le d \le 11$ with $z = 3\pi$, and $4 \le d \le 9$ with $z = 5\pi$,
Figures 7(b) and (d) also display Gaussian approximations of $p_d(\delta)$.
Parts (a) and (c) of the figure, in turn, contain the corresponding simulation
data only, and we remark that the absence of some approximation computed from
the Gaussians of part (b) or (d) is not a matter of difficulty of principle. In
fact, the counterpart of Equation (1), obtained also from Bayes' theorem and
such that

$$P_\delta(d) = \frac{p_d(\delta)\,P(d)}{p(\delta)} = \frac{p_d(\delta)\,P(d)}{\sum_{s=1}^{n-1} p_s(\delta)\,P(s)}, \tag{12}$$

can in principle be used with either those Gaussians or the concentrated
densities in place of $p_s(\delta)$ as appropriate for each s. What prevents
this, however, is that we lack a characterization of $P(s)$ that is not based on
simulation data only.

Figure 5: $P'_\delta(3)$. Solid lines give the analytic predictions.

Still in regard to Figure 7, one possible interpretation of the good fit by
Gaussians of the simulation data for $p_d(\delta)$ comes from resorting to the
central limit theorem in its classical form [35]. In order to do this, we view
$\delta$ as valuing the random variable representing the average Euclidean
distance to node a of all nodes that are d edges apart from a. The emergence of
$p_d(\delta)$ as a Gaussian for
d > 3 (provided d is small enough that the circle's border is not influential) may 
then indicate that, for each value of d, the Euclidean distances of those nodes 
to node a are independent, identically distributed random variables. While we 
know that this does not hold for the smaller values of d as a consequence of 
the uniformly random positioning of the nodes in the circle (smaller Euclidean 
distances to a are less likely to occur for the same value of d) , it would appear 
that it begins to hold as d is increased. 

To summarize, we have considered a network of sensors placed uniformly 
at random in a two-dimensional region and, for its representation as a ran- 
dom geometric graph, have studied two distance-related distributions. One of 
them is the probability distribution of distances between two randomly chosen 
nodes, conditioned on the Euclidean distance between them. The other is the 
probability density function associated with the Euclidean distance between two 
randomly chosen nodes, given the distance between them. We have provided an- 
alytical characterizations whenever possible, in the simplest cases as closed-form 
expressions, and have also validated these predictions through simulations. 
Figure 6: $P_\delta(3)$ (a) and $p_3(\delta)$ (b). Solid lines give the analytic predictions.
Figure 7: $P_\delta(d)$ and $p_d(\delta)$ for d > 3, with $z = 3\pi$ (a, b) and
$z = 5\pi$ (c, d). Solid lines give the Gaussians that best fit some of the
$p_d(\delta)$ data, each of mean $\mu$ and standard deviation $\sigma$ as
indicated.
While further work related to additional analytical characterizations is worth
undertaking, as is the investigation of the three-dimensional case, we find that
the most promising tracks for future investigation are those that relate to
applications. In Section 1 we illustrated this possibility in the context of sensor
localization, for which it seems that understanding the distance-related distri- 
butions we have studied has the potential to help in the discovery of better 
distributed algorithms. Whether there will be success on this front remains to 
be seen, as well as whether other applications will be found with the potential 
to benefit from the results we have presented. 